[57] ABSTRACT

In a distributed heterogeneous computer system having a plurality of
computer nodes each operatively connected through a network interface
to a network to provide for communications and transfers of data
between the nodes and wherein the nodes each have a queue for
containing jobs to be performed, an improvement for dynamically
reallocating the system’s resources for optimized job performance.
There is first logic at each node for dynamically and periodically
calculating and saving a workload value as a function of the number of
jobs on the node’s queue. Second logic is provided at each node for
transferring the node’s workload value to other nodes on the network at
the request of the other nodes. Finally, there is third logic at each
node operable at the completion of each job. The third logic includes
logic for checking the node’s own workload value, logic for polling all
the other nodes for their workload value if the checking node’s
workload value is below a preestablished value indicating the node as
being underutilized and available to do more jobs, logic for checking
the workload values of the other nodes as received, and logic for
transferring a job from the queue of the other of the nodes having the
highest workload value over a preestablished value indicating the other
of the nodes as being overburdened and requiring job relief to the
queue of the checking node. The third logic is also operable
periodically when the node is idle.

16 Claims, 3 Drawing Sheets

DYNAMIC RESOURCE ALLOCATION SCHEME
FOR DISTRIBUTED HETEROGENEOUS
COMPUTER SYSTEMS

ORIGIN OF THE INVENTION 

The invention described herein was made in the performance of work
under a NASA contract, and is subject to the provisions of Public Law
96-517 (35 USC 202) in which the Contractor has elected not to retain
title.

TECHNICAL FIELD 

The invention relates to resource allocation in computer systems and,
more particularly, to a method and associated apparatus for shortening
response time and improving efficiency of a heterogeneous distributed
networked computer system by reallocating the jobs queued up for busy
nodes to idle or less-busy nodes. In accordance with a novel algorithm,
the load-sharing is initiated by the server device in a manner such
that extra overhead is not imposed on the system during heavily-loaded
conditions.

BACKGROUND ART

In distributed networked computer systems there is a high probability
that one of the workstations will be idle while others are overloaded.
Thus, the response times for certain tasks are longer than they would
be if all the capabilities in the system could be shared fully. As is
known in the art, the solution is to reallocate tasks from the queues
of busy computers to the queues of idle computers.

As depicted in FIG. 1, a distributed computer system 10 consists of
several computers 12 with the same or different processing
capabilities, connected together by a network 14. Each of the computers
12 has tasks 16 assigned to it for execution. In such a distributed
multi-computer system, the probability is high that one of the
computers 12 is idle while another computer 12 has more than one task
16 waiting in the queue for service. This probability is called the
“imbalance probability”.

A high imbalance probability typically implies poor system
performance. By reallocating queued tasks or jobs to the idle or
lightly-loaded computers 12, a reduction in system response time can be
expected. This technique is called “load sharing” and is one of the
main foci of this invention. As also depicted in FIG. 1, such
redistribution of the tasks 16 on a dynamic basis is known in the art.
Typically, there is a control computer 18 attached to the network 14
containing task lists 20. On various bases, the control computer 18
dynamically reassigns tasks 16 from the lists 20 to various computers
12 within the system 10. For example, it is known in the art to have
each of the computers 12 provide the control computer 18 with an
indicator of the amount of computing time on tasks that is actually
taking place. The control computer 18, with knowledge of the amount of
use of each computer 12 available, is then able to reallocate the tasks
16 as necessary. In military computer systems, and the like, this
ability to reconfigure, redistribute, and keep the system running is an
important part of what is often referred to as “graceful degradation”;
that is, the system 10 continues to operate as best it can to do the
tasks at hand on a priority basis for as long as it can.

The inventors herein did a considerable amount of statistical analysis
and evaluation of networked computer systems according to the known
prior techniques for load distribution and redistribution. Their
findings will now be set forth by way of example to provide a clear
picture of the background and basis for the present invention.

The imbalance probability, IP, for a heterogeneous system can be
calculated by mathematical techniques well known to those skilled in
the art which, per se, are no part of the novelty of the present
invention. There is a finite, calculable probability that I out of N
computers comprising a networked system are idle. There is also a
finite probability that all stations other than those I stations are
busy, as well as a probability that there is exactly one job in each
one of the remaining (N-I) stations. The difference between these two
probabilities is the probability that at least one out of the (N-I)
stations has one or more jobs waiting for service. By summing over the
number of idle stations, from 1 to N, the imbalance probability for the
whole system can be obtained. By way of example, in a homogeneous
system, all the nodes (i.e. computers 12) have the same service rate
and the same arrival rate. As the number of nodes increases, the peak
of the imbalance probability goes higher. As the number of nodes
increases to twenty, the imbalance probability approaches 1 when the
traffic intensity (arrival rate divided by the service rate at each
node) ranges from 40% to 80%. The statistical curves also indicate that
the probability of imbalance is highest during moderate traffic
intensity. This occurs because, at the extremes, nearly all nodes are
either idle (i.e. there is low traffic intensity) or busy (i.e. there
is high traffic intensity).
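By way of illustration only, the summation described above can be
sketched as follows for a homogeneous system. The sketch assumes that
each node behaves as an independent M/M/1 queue with traffic intensity
rho, so that a node is idle with probability 1 - rho, busy with
probability rho, and holds exactly one job with probability
rho(1 - rho). This queueing model and the function names are
assumptions made for the example and are not taken from the inventors'
analysis.

```python
from math import comb


def imbalance_probability(n: int, rho: float) -> float:
    """Probability that at least one node is idle while at least one other
    node has a job waiting, for n independent M/M/1 nodes (assumed model)."""
    p_idle = 1.0 - rho           # node holds no job
    p_busy = rho                 # node holds at least one job
    p_one = rho * (1.0 - rho)    # node holds exactly one job (nothing waiting)
    total = 0.0
    for i in range(1, n + 1):    # i = number of idle nodes
        others = n - i
        # All remaining nodes busy, minus the case where none has a job waiting.
        waiting = p_busy ** others - p_one ** others
        total += comb(n, i) * p_idle ** i * waiting
    return total


def imbalance_probability_closed(n: int, rho: float) -> float:
    """Same quantity by inclusion-exclusion, used here only as a cross-check."""
    return 1.0 - rho ** n - (1.0 - rho ** 2) ** n + (rho * (1.0 - rho)) ** n


if __name__ == "__main__":
    for n in (2, 5, 20):
        for rho in (0.2, 0.4, 0.6, 0.8):
            print(n, rho, round(imbalance_probability(n, rho), 4))
```

Under these assumptions the value for twenty nodes stays close to 1
over the 40% to 80% range of traffic intensity, while a two-node system
peaks far lower, which is consistent with the trend described above.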

If the arrival rate is not evenly distributed, the imbalance
probability becomes even higher. Consider, by way of example, the
imbalance probability of a two-node heterogeneous system in which the
faster node is twice as fast as the slower one and the work is evenly
distributed. If the work is not balanced, it has been observed that the
imbalance probability goes even higher during high traffic intensity at
the slower node. At this point, the slower node is heavily loaded even
though the faster node is only 50% utilized.

Numerous studies have addressed the problem of 
resource-sharing in distributed systems. It is convenient 
to classify these strategies as being either static or dy- 
namic in nature and as having either a centralized or 
decentralized decision-making capability. One can fur- 
ther distinguish the algorithms by the type of node that 
takes the initiative in the resource-sharing. Algorithms 
can either be sender-initiated or server-initiated. Some 
algorithms can be adapted to a generalized heteroge- 
neous system while others can only be used in a homo- 
geneous system. These categories are further explained 
as follows: 

Static/Dynamic: Static schemes use only the infor- 
mation about the long-term average behavior of the 
system, i.e. they ignore the current state. Dynamic 
schemes differ from static schemes by determining how 
and when to transfer jobs based on the time-dependent 
current system state instead of the average behavior of 
the system. The major drawback of static algorithms is 
that they do not respond to fluctuations of the work- 
load. Dynamic schemes attempt to correct this draw- 
back but are more difficult to implement and may intro- 
duce additional overhead. In addition, dynamic 
schemes are hard to analyze. 

Centralized/Decentralized: In a system with central- 
ized control, jobs are assumed to arrive at the central 
controller which is responsible for distributing the jobs 
among the network’s nodes; in a decentralized system,
jobs are submitted to the individual nodes and the decision to transfer
a job to another node is made locally. The central dispatcher approach
is quite restrictive for a distributed system.

Homogeneous/Heterogeneous system: In the homogeneous system, all the
computer nodes are identical and have the same service rate. In the
heterogeneous system, the computer nodes do not have the same
processing power.

Sender/Server Initiated: If the source node makes a determination as
to where to route a job, this is defined as a sender-initiated
strategy. In server-initiated strategies, the situation is reversed,
i.e., lightly-loaded nodes search for congested nodes from which work
may be transferred.

The prior art as discussed in the literature (see Listing 
of Cited References hereinafter) will now be addressed 
with particularity. 

First, there are the static strategies. Stone [Ston 78] developed a
centralized maximum flow algorithm for two processors (i.e. computer
nodes) by holding the load of one processor fixed and varying the load
on the other processor. Ni and Hwang [Hwan 81] studied the problem of
load balancing in a multiple heterogeneous processor system with many
job classes. In this system, the number of processors was extended to
more than two. Tantawi and Towsley [Tant 85] formulated the static
resource-sharing problem as a nonlinear programming problem and
presented two efficient algorithms, the parametric-study algorithm and
the load-balancing problem. Silva and Gerla [Silv 84] used a downhill
queueing procedure to search for the static optimal job assignment in a
heterogeneous system that supports multiple job classes and site
constraints. Recently, Kurose and Singh [Kuro 86] used an iterative
algorithm to deal with the static decentralized load-sharing problem.
Their algorithm was examined by theoretical and simulation techniques.

Next, there are the dynamic strategies. Chow and Kohler [Chow 79] used
a queueing theory approach to examine a resource-sharing algorithm for
a heterogeneous two-processor system with a central dispatcher. Their
objective was to minimize the mean response time. Foschini and Salz
[Fosc 79] generalized one of the methods developed by Chow and Kohler
to include multiple job dispatchers. Wah [Wah 84] studied the
communication overhead of a centralized resource-sharing scheme
designed for a homogeneous system. Load-balancing of the Purdue ECN
(Engineering Computer Network) was implemented with a dynamic
decentralized RXE (remote execution environment) program [Hawn 82].
With the decentralized RXE, the load information of all the processors
was maintained in each network machine’s kernel. One of the problems
with this approach is the potentially high cost of obtaining the
required state information. It is also possible for an idle processor
to acquire jobs from several processors and thus become overloaded. Ni
and Xu [Ni 85] propose the “draft” algorithm for a homogeneous system.
Wah and Juang [Wah 85] propose a window control algorithm to schedule
the resource in local computer systems with a multi-access network.
Wang and Morris [Wang 85] studied ten different algorithms for
homogeneous systems to evaluate the performance differences. Eager, et
al. [Eage 86] addressed the problem of decentralized load sharing in a
multiple system using dynamic-state information. Eager discussed the
appropriate level of complexity for load-sharing policies and showed
that schemes that use relatively simple state information do very well
and perform quite closely to the optimal expected performance. The
system configuration studied by Eager, et al. was also a homogeneous
system. Towsley and Lee [Tows 86] used the threshold of the local job
queue length at each host to make decisions for remote processing. This
computer system was generalized to be a heterogeneous system.

In summary, most of the work reported in the literature has been
limited to either static schemes, centralized control, homogeneous
systems, or to two-processor systems where overhead considerations were
ignored. All of these approaches make assumptions that are too
restrictive to apply to most real computer system installations. The
main contribution of this reported work is the development of a
dynamic, decentralized, resource-sharing algorithm for a heterogeneous
multiple (i.e. greater than two) processor system. Because it is
server-initiated, this approach thus differs significantly from the
sender-initiated approach described in [Tows 86]. The disadvantage of
this prior art sender-initiated approach is that it imposes extra
overhead in the heavily-loaded situation and, therefore, could bring
the system to an unstable state.

The foregoing articles and reports from the literature are only
generally relevant for background discussion purposes and, since copies
are not readily available to applicants for filing herewith, they are
not being provided. In addition to the foregoing non-supplied articles
from the literature, however, copies of the following relevant U.S.
Letters Patent are being provided herewith:

[1] Hoschler, H., Raimar, W., and Brandmaier, K. “Method of Operating
a Data Processing System,” U.S. Pat. No. 4,099,235, July 4, 1978.

[2] Kitajima, H. and Ohmachi, K. “Processing Request Allocator for
Assignment of Loads in a Distributed Processing System,” U.S. Pat. No.
4,495,570, Jan. 22, 1985.

[3] Fry, S. M., Hempy, H. O., and Kittinger, B. E. “Balancing
Data-Processing Workloads,” U.S. Pat. No. 4,403,286, Sept. 6, 1983.

With respect to the above-listed U.S. Patents and the teaching thereof
vis-a-vis the present invention to be described hereinafter, the
inventors herein have invented a new dynamic load-balancing scheme for
a distributed computer system consisting of a number of heterogeneous
hosts connected by a local area network (LAN). As mentioned above,
numerous studies have addressed the problem of resource-sharing in
distributed systems. For purposes of discussion and distinguishing, it
is convenient to classify these strategies as being either static or
dynamic in nature and as having either a centralized or decentralized
decision-making capability. One can further distinguish the algorithms
employed by the type of node that takes the initiative in the
resource-sharing. Algorithms can be either sender-initiated or
receiver-initiated. Some algorithms can be adapted to a generalized
heterogeneous system while others can be used only in a homogeneous
system. These categories are further addressed with respect to the
above-referenced prior art patents as follows.

Centralized/Decentralized: In a system with central- 
ized control (as shown in FIG. 1) jobs arrive at the 
central control computer 18 which is responsible for 
distributing the jobs among the network nodes. In a 
decentralized system, jobs are submitted to the individ- 
ual nodes and the decision to transfer a job to another 
node is made locally. The central dispatcher approach is 
quite restrictive for a distributed system. In the teachings of their
patent, Kitajima and Ohmachi assign a
processing request allocator to be the single controller
of their centralized scheme. One of the problems with 
this approach is the potentially high cost of obtaining 
the required state information. It is also possible for an 
idle processor to acquire jobs from several processors 
and thus become overloaded. 

Homogeneous/Heterogeneous system: In a homoge- 
neous system, all computer nodes must be identical and 
have the same service rate. In the heterogeneous sys- 
tem, the computer nodes do not have the same processing power. In
their patent, Fry, Hempy, and Kittinger disclose a scheme to balance
data-processing workloads in a homogeneous environment.

Sender/Receiver Initiated: If the source node makes 
a determination as to where to route a job, this is de- 
fined as a sender-initiated strategy. In receiver-initiated 
strategies, the situation is reversed, i.e., lightly-loaded 
nodes search for congested nodes from which work 
may be transferred. In their patent, Hoschler, Raimar, and
Brandmaier disclose a sender-initiated
scheme. The inventors herein have proved that the 
receiver-initiated approach is superior at medium to 
high loads and, therefore, have incorporated such an 
approach in their invention in a novel manner. 

Static/Dynamic: Static schemes use only the infor- 
mation about the long-term average behavior of the 
system, i.e. they ignore the current state. Dynamic 
schemes differ from the static schemes by determining 
how and when to transfer jobs based on the time- 
dependent current system state instead of the average 
behavior. The major drawback of static algorithms is 
that they do not respond to fluctuations of the work- 
load. Dynamic schemes attempt to correct this draw- 
back. 

STATEMENT OF THE INVENTION 

Accordingly, an object of the invention is to provide a dynamic
decentralized resource-sharing algorithm for a heterogeneous
multi-processor system.

It is another object of the invention to provide a 
dynamic decentralized resource-sharing algorithm for a 
heterogeneous multi-processor system which is receiv- 
er-initiated in heavy load so that it does not impose 
extra overhead in the heavily-loaded situation and, 
therefore, will not bring the system to an unstable state. 

Another object of the present invention is to prevent an idle node in
a heterogeneous multi-processor system from becoming isolated from the
resource-sharing process, as can happen with the systems of Fry and
Kitajima, by providing a wakeup timer used at each idle node to
periodically cause the idle node to search for a job that can be
transferred from a heavily-loaded node.

Still another object of the present invention is to use 
the local queue length and the local service rate ratio at 
each node as a more efficient workload indicator. 

It is yet a further object of the invention to provide a 
dynamic decentralized resource-sharing algorithm for a 
heterogeneous multi-processor system which dynami- 
cally adjusts to the traffic load and does not generate 
extra overhead during high traffic loading conditions 
and, therefore, cannot bring the system to an unstable 
state. 

The foregoing objects have been achieved in a dis- 
tributed heterogeneous computer system having a plu- 
rality of computer nodes each operatively connected 
through a network interface to a network to provide for 
communications and transfers of data between the 
nodes and wherein the nodes each have a queue for 
containing jobs to be performed, by the improvement of 
the present invention for dynamically reallocating the 
system’s resources for optimized job performance. 
There is first logic at each node for dynamically and 
periodically calculating and saving a workload value as 
a function of the number of jobs on the node’s queue. 
Second logic is provided at each node for transferring
the node’s workload value to other nodes on the net- 
work at the request of the other nodes. Finally, there is 
third logic at each node operable at the completion of 
each job. The third logic includes logic for checking
the node’s own workload value, logic for polling all the 
other nodes for their workload value if the checking 
node’s workload value is below a pre-established value 
indicating the node as being underutilized and available 
to do more jobs, logic for checking the workload values 
of the other nodes as received, and logic for transferring
a job from the queue of the other of the nodes having 
the highest workload value over a pre-established value 
indicating the other of the nodes as being overburdened 
and requiring job relief to the queue of the checking 
node. The third logic is also operable periodically when 
the node is idle. 

Other objects and advantages of the present invention 
will become apparent from the description which fol- 
lows hereinafter when taken in conjunction with the 
drawing figures which accompany it. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a simplified block diagram of a prior art 
distributed computer system employing a control com- 
puter to distribute and redistribute the tasks among the 
computer nodes on the network. 

FIG. 2 is a simplified block diagram of a distributed 
computer system according to the present invention. 

FIG. 3 is a functional block diagram of a computer 
from the system of FIG. 2 pointing out the portions 
thereof related to the present invention. 

FIG. 4 is a functional block diagram of the system of FIG. 2 showing
the manner in which tasks are dynamically reallocated according to the
dual mode approach of the present invention.

FIG. 5 is a flowchart of the logic of the method of the 
present invention. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The new and novel resource-sharing algorithm of the present invention,
which the inventors call the Server-Initiated Dynamic Resource-Sharing
Algorithm (SIDA for short), will now be presented. A queueing analysis
of the algorithm’s performance is presented and the result is validated
by the reporting of simulation results. The basic environment is as
depicted in FIG. 2; that is, there are a plurality of computer nodes
12' distributed across a network 14 without the need for a control
computer 18 as in the prior art of FIG. 1.

As observed earlier herein, most of the work of the prior art has been
limited to either static schemes, centralized control, or homogeneous
systems. All of these approaches make assumptions that are much too
restrictive to apply to most real computer system installations. The
main contribution of the present invention is the providing of a
dynamic, decentralized, resource-sharing algorithm for a heterogeneous,
multi-processor, computer system (such as that generally indicated as
10' in FIG. 2). The algorithm employed in the present invention uses a
dual-mode, server-initiated approach which is clearly novel over
anything taught or suggested by the prior art. Jobs are transferred
from heavily burdened nodes (i.e. over a high threshold limit) to
lightly burdened (or idle) nodes at the initiation of the receiving
node when (1) a job finishes at a node which is burdened below a
pre-established threshold level or (2) a node has been idle for a
period of time as established by a wakeup timer at the node. The
important advantage of this approach is that, unlike the prior art
approaches, it does not impose extra overhead in the heavily-loaded
situation. Therefore, it cannot bring the system to an unstable state.
The algorithm also has two further important advantages over the prior
art. First, to prevent an idle node from becoming isolated from the
resource-sharing process, the wakeup timer is included. As will be
addressed in greater detail shortly, the wakeup timer is used at each
idle node to periodically cause the idle node to search for a job that
can be transferred from a heavily-loaded node. Second, this invention
uses a combination of the local queue length and the local service rate
ratio at each node as the workload indicator. In a heterogeneous
computer system, it is more efficient to use this workload indicator
rather than just the local queue length as employed in the prior art.

It was determined by the inventors herein that an ideal
resource-sharing algorithm should have the following characteristics:

1) Dynamic: the load distribution should adapt to rapid system load
changes.

2) Decentralized: each processing computer node should determine, on
its own, whether to process a job locally or to send the job to some
other node for processing. There is no need for a central dispatcher.
Since the central dispatcher is not required, the problem of a
potential single point of failure is eliminated.

3) Server-initiated: a good scheme should only request job relocation
when there are idle processors available to serve the relocated jobs.
By using the server-initiated approach, the danger of sender-initiated
schemes, i.e. that all the nodes are overloaded and each attempts to
give away jobs causing unproductive overhead which simply further
saturates the already overloaded system, is eliminated.

The following is an outline of the basic SIDA algorithm in a computer
language type of form:

GIVEN
    N   = Number of total nodes in the heterogeneous system;
    H_i = Service rate of node i;
    Q_i = Queue length at node i.
WHILE
    {[(a job completes at node i) or wakeup-timeout] and
     [Q_i/H_i ≤ LOW-THRESHOLD]}
DO
    probe Q_k for k = 1 to N, k ≠ i;
    identify the node, j, with the MAX(Q_k);
    if Q_j/H_j ≥ HIGH-THRESHOLD then
    DO
        transfer 1 job from node j to node i;
        (* job is processed at node i and the results are returned
           to node j *)
    END
END.
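The outline above may be sketched in executable form as follows. This
is a minimal illustration rather than the implementation of the
invention: the Node class, the function name sida_step, and the two
threshold values are hypothetical, the busiest node is selected by its
workload indicator (queue length divided by service rate) as described
for the task allocation and transfer logic, and the choice of which
queued job to move (here, the most recently queued one) is an
assumption, since the outline does not specify it.

```python
from collections import deque
from dataclasses import dataclass, field

LOW_THRESHOLD = 0.5    # assumed value, for illustration only
HIGH_THRESHOLD = 2.0   # assumed value, for illustration only


@dataclass
class Node:
    name: str
    service_rate: float                       # H_i in the outline
    queue: deque = field(default_factory=deque)

    def workload(self) -> float:
        """Workload indicator: queue length divided by service rate."""
        return len(self.queue) / self.service_rate


def sida_step(local: Node, others: list) -> bool:
    """Run when a job completes at `local` or its wakeup timer expires.

    Returns True if one job was transferred to `local`.
    """
    if local.workload() > LOW_THRESHOLD:
        return False               # not lightly loaded: no probing, no overhead
    if not others:
        return False
    busiest = max(others, key=Node.workload)   # probe every other node
    if busiest.workload() < HIGH_THRESHOLD:
        return False               # nobody is overburdened
    local.queue.append(busiest.queue.pop())    # transfer one job
    return True


if __name__ == "__main__":
    a = Node("a", 1.0)
    b = Node("b", 2.0, deque(["j1", "j2", "j3", "j4"]))
    c = Node("c", 1.0, deque(["k1", "k2", "k3"]))
    print(sida_step(a, [b, c]), list(a.queue))
```

Because the first test fails immediately at any node whose workload
indicator exceeds the low threshold, a system in which every node is
heavily loaded issues no probes at all in this sketch.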

The environment of the present invention is shown in greater detail in
FIGS. 3 and 4 while the dual mode algorithm logic is shown in flowchart
form in FIG. 5. Each computer node 12' includes a task queue 22.
Pointers are provided to the top of queue (TOQ), end of queue (EOQ),
and bottom of queue (BOQ). The length of the active contents of the
queue 22 at any time can be determined by subtracting TOQ from BOQ. The
percentage of the active contents of the queue 22 at any time is, of
course, easily calculated as (BOQ - TOQ)/(EOQ - TOQ). As depicted in
FIG. 3, the operating system 24, for example, can provide a service
rate 26 for the node 12', i.e. the ratio (percentage) of computational
usage of the node 12' compared with its potential. The SIDA, as
incorporated into the block labelled TASK ALLOCATION AND TRANSFER LOGIC
28, uses the length of the local queue 22 and the local service rate 26
at each node 12' as the workload indicator 30 for that node 12'. When a
job finishes at a node 12', the logic 28 of the node 12' checks the
workload indicator 30 obtained by dividing the queue length by the
service rate. If the workload indicator 30 is less than a certain low
threshold level, the lightly-loaded node 12' initiates a search for the
most busy node 12'.
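The pointer arithmetic and the workload indicator just described can be
illustrated with a short sketch. The function names and the sample
pointer values are hypothetical, and the service rate is simply treated
as a positive number supplied by the operating system; none of these
details are specified above.

```python
def queue_length(toq: int, boq: int) -> int:
    """Length of the active contents: BOQ minus TOQ."""
    return boq - toq


def queue_fill_fraction(toq: int, boq: int, eoq: int) -> float:
    """Fraction of the queue in use: (BOQ - TOQ) / (EOQ - TOQ)."""
    return (boq - toq) / (eoq - toq)


def workload_indicator(toq: int, boq: int, service_rate: float) -> float:
    """Queue length divided by the node's service rate."""
    return queue_length(toq, boq) / service_rate


if __name__ == "__main__":
    # Two nodes with the same queue length but different service rates.
    slow = workload_indicator(toq=100, boq=106, service_rate=1.0)
    fast = workload_indicator(toq=100, boq=106, service_rate=3.0)
    print(queue_length(100, 106), queue_fill_fraction(100, 106, 120))
    print(slow, fast)
```

The two sample nodes hold six jobs each, yet the slower node reports an
indicator three times larger than the faster one, which is the reason a
heterogeneous system benefits from dividing by the service rate instead
of comparing raw queue lengths.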

As depicted in FIG. 4, in the system 10' of the present invention, the
task allocation and transfer logic 28 of each node 12' is connected to
the network 14 through a network interface 32. Thus, as configured, the
task allocation and transfer logic 28 of each node 12' can access the
last calculated workload indicator 30 of all the other nodes 12' on the
network 14. To accomplish a search for the most busy node 12', the task
allocation and transfer logic 28 of a node 12' simply requests the
workload indicators 30 of all the other nodes 12' on the network 14. If
the workload indicator 30 of the most busy node 12' is above a certain
high threshold, a job from the task queue 22 of that busy node 12' is
transferred to the lightly-loaded node 12'. To prevent an idle node 12'
from becoming isolated from the resource-sharing process, the wakeup
timer 34 is used at each idle node 12' to periodically cause the idle
node 12' to search for a job which can be transferred from a heavily
loaded node 12'. Thus, SIDA provides a method which balances the
workload among the nodes 12', resulting in a beneficial and substantial
improvement in the response time and throughput performance of the
total system. It should be noted that SIDA adjusts dynamically to the
traffic load of each node 12'. When the workload indicator of every
node 12' is greater than a certain threshold level, the algorithm
generates no overhead, an important and novel advantage over the prior
art.

In the prior art, such as reported in Wah’s research [Wah 85], the
priority level of the load-balancing process is assumed to be lower
than that of regular jobs, thereby preventing the resource-sharing
procedure from inhibiting normal operation. In direct contrast to this
prior art assumption and teaching, however, for SIDA it is necessary
and preferred to assign the highest priority level to the
resource-sharing process. This is because, otherwise, the
lightly-loaded nodes 12' could not receive the workload indicators 30
from the other nodes 12' upon which to base a decision and jobs would
never be transferred from the heavily-loaded nodes 12' in a timely
manner, i.e. before the heavily-loaded nodes 12' become lightly-loaded.

VALIDITY TESTING RESULTS 

The present invention and its novel algorithm were 
verified by simulation modeling. A multiple processor 
heterogeneous system was considered in which the 
service rates of the nodes are not necessarily identical. 
Each node was modeled as a queueing center. For a 
particular station m, new jobs were assumed to arrive at 
rate r_m. The average service rate was s_m and the inter-wakeup time,
which was the same for all nodes, was 1/r_w. The state of the system
was defined to be the number of jobs in a node, either in the queue or
being served.
The objective of the performance analysis was to deter- 
mine the effects of resource-sharing according to the 
method of the present invention on the average system 
response time. These effects are a function of the traffic 
intensity, which is defined as the ratio of the job arrival 
rate to the job service rate. 

The basic assumptions made in this performance 
study were as follows: 

a) There is only one class of tasks. Task inter-arrival times at each
processor are exponentially distributed. The arrival rates at each
processor may be different.

b) The resource service times are exponentially distributed and the
service rates may be different for each resource processor.

c) The average wakeup rate r_w for each idle node is the same and the
inter-wakeup times are exponentially distributed.

d) Each job needs only one resource. 

e) The network transmission delay in propagating 
requests, probing the status of other nodes, and return- 
ing results is negligible. This assumption is valid if the 
transmission bandwidth is large compared to the traffic 
on the network. 

f) All processing overhead for probing, packing and unpacking data,
and request and result transfer is ignored.
This assumption is valid if the processing required to 
pack and unpack the job is significantly less than the 
processing required to process the job. 

g) For simplicity, only the queue length was used as 
the workload indicator. 

h) Lightly-loaded nodes polled jobs from any heavi- 
ly-loaded node rather than selecting only the most 
heavily-loaded one. 

i) The low threshold is assumed to be 0. In other words, a node tries
to poll a job from other nodes while it is idle.

j) The high threshold is assumed to be 2. 
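Assumptions g) through j) can be sketched as a pair of threshold tests and a donor-selection rule. This is a minimal Python sketch, not the patented implementation: the function names are hypothetical, the comparison operators are one reading of the thresholds (idle means zero jobs; heavily loaded means two or more jobs, that is, more than one), and the uniform random choice stands in for "any heavily-loaded node".

```python
import random

LOW_THRESHOLD = 0    # assumption i): a node polls only while it is idle
HIGH_THRESHOLD = 2   # assumption j): two or more jobs counts as heavy


def is_lightly_loaded(jobs: int) -> bool:
    """Queue length is the only workload indicator (assumption g)."""
    return jobs <= LOW_THRESHOLD


def is_heavily_loaded(jobs: int) -> bool:
    return jobs >= HIGH_THRESHOLD


def pick_donor(queue_lengths, me, rng=random):
    """Pick any heavily-loaded node other than `me` (assumption h).

    Returns the index of the chosen node, or None if no node qualifies.
    """
    candidates = [k for k, jobs in enumerate(queue_lengths)
                  if k != me and is_heavily_loaded(jobs)]
    return rng.choice(candidates) if candidates else None
```

Note that the claims select the node with the highest workload value, whereas the performance study relaxes this to any heavily-loaded node; the sketch follows the study.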

Exact analysis of the algorithm was quite complex
since one is faced with an N-dimensional Markov chain.
The inventors employed a simplified approximate 
model which they proved to their satisfaction to be 
quite accurate in performance prediction. The approach 
characterized the interaction between the various queues
in terms of their steady-state occupancy probabilities. 
Iteration was then used to update estimates of the inter- 
action. To account for the load from other nodes, the 
inventors divided by the expected number of nodes that 
have more than one job. The findings were as follows.

When the last job is completed and the node is about
to leave state 1, node m always probes all other nodes. 
There are three detailed procedures that take place 
during this state transition: 

1) The last local job finishes.

2) Node m probes all the other nodes to check 
whether any other node has more than one job. Note, 
there is a probability that at least one of the remote 
nodes has one or more jobs waiting in queue for service. 

3) If a remote node has more than one job, the algorithm
transfers one job from the remote node to the
local node and the node remains in state 1. There is a
probability that node m cannot find an overloaded node
and therefore node m transfers to state 0 with the service
rate s_m.

While in an idle state, node m “wakes up” frequently
at the average wakeup rate r_w. In all cases, after
wakeup, this idle node should be able to poll a job from
a remote node with the exception of the case where all
other nodes are either idle or only have one job.
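The state transitions described above (probe on finishing the last local job, probe again on each wakeup while idle) can be explored with a small event-driven simulation. The Python sketch below is an illustration under the stated assumptions of the study, not the inventors' analysis: the function names, the `share` flag used to obtain a no-sharing baseline, the example rates, and the use of Little's law to convert mean occupancy into mean response time are all additions made here.

```python
import random

HIGH_THRESHOLD = 2  # a node holding at least this many jobs may give one away


def simulate(arrival, service, wakeup, horizon, share=True, seed=0):
    """Return the time-averaged number of jobs at each node."""
    rng = random.Random(seed)
    count = len(arrival)
    jobs = [0] * count      # jobs queued or in service at each node
    area = [0.0] * count    # time integral of jobs[m]
    now = 0.0
    while True:
        # Three competing exponential events per node:
        # arrival, completion (busy nodes), wakeup (idle nodes).
        rates = []
        for m in range(count):
            rates += [
                arrival[m],
                service[m] if jobs[m] > 0 else 0.0,
                wakeup if share and jobs[m] == 0 else 0.0,
            ]
        step = rng.expovariate(sum(rates))
        if now + step >= horizon:
            for m in range(count):
                area[m] += jobs[m] * (horizon - now)
            break
        for m in range(count):
            area[m] += jobs[m] * step
        now += step
        event = rng.choices(range(len(rates)), weights=rates)[0]
        m, kind = divmod(event, 3)
        if kind == 0:           # arrival at node m
            jobs[m] += 1
            continue
        if kind == 1:           # completion at node m
            jobs[m] -= 1
        # After finishing its last job, or on wakeup, an idle node probes.
        if share and jobs[m] == 0:
            donors = [k for k in range(count)
                      if k != m and jobs[k] >= HIGH_THRESHOLD]
            if donors:
                k = rng.choice(donors)
                jobs[k] -= 1
                jobs[m] += 1
    return [a / horizon for a in area]


def mean_response_time(mean_jobs, arrival):
    """Little's law: mean time in system = mean jobs / total arrival rate."""
    return sum(mean_jobs) / sum(arrival)


# Hypothetical two-node system: node 0 is slow and busy, node 1 fast and quiet.
arrival, service = [0.9, 0.1], [1.0, 2.0]
shared = simulate(arrival, service, wakeup=5.0, horizon=20000.0)
alone = simulate(arrival, service, wakeup=5.0, horizon=20000.0, share=False)
```

Comparing `mean_response_time(shared, arrival)` with `mean_response_time(alone, arrival)` gives a rough empirical view of the effect of resource sharing on average system response time for one choice of rates.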

The inventors saw that the transition rates, and hence
the steady-state solution, for node m depended on the
steady-state probabilities of the other queues in the
system. Iteration was used to produce a result that is,
hopefully, a close approximation. From the state-transition
diagram, the inventors calculated the steady-state probabilities
and found that the results verified the expected
performance of the algorithm embodied in the present
invention.
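One possible reading of the iterative approximation is a fixed-point computation over per-node birth-death chains, sketched below in Python. The specific rate expressions are assumptions made for illustration and are not taken from the text: the probability `find` that a probe locates an overloaded node treats the other queues as independent, the pull rate `pull` divides the other nodes' probe attempts by the expected number of overloaded nodes, and the state space is truncated at a hypothetical `cap`.

```python
def birth_death(up, down):
    """Stationary distribution of a birth-death chain.

    up[n] is the rate from state n to n+1; down[n] the rate from n+1 to n.
    """
    weights = [1.0]
    for u, d in zip(up, down):
        weights.append(weights[-1] * u / d)
    total = sum(weights)
    return [w / total for w in weights]


def approximate(arrival, service, wakeup, cap=200, sweeps=500, tol=1e-12):
    """Iterate per-node steady-state probabilities toward a fixed point."""
    count = len(arrival)
    # Initial guess: each node in isolation (truncated single-server queue).
    probs = [birth_death([arrival[m]] * cap, [service[m]] * cap)
             for m in range(count)]
    for _ in range(sweeps):
        change = 0.0
        for m in range(count):
            others = [k for k in range(count) if k != m]
            # Probability that each other node holds more than one job.
            over = {k: max(0.0, 1.0 - probs[k][0] - probs[k][1])
                    for k in others}
            none = 1.0
            for k in others:
                none *= 1.0 - over[k]
            find = 1.0 - none   # a probe by node m finds an overloaded node
            # Probe attempts by the other nodes (leaving state 1, or waking
            # up while idle), shared among the expected number of
            # overloaded nodes, node m included.
            attempts = sum(service[k] * probs[k][1] + wakeup * probs[k][0]
                           for k in others)
            pull = attempts / (1.0 + sum(over.values()))
            up = [arrival[m] + wakeup * find] + [arrival[m]] * (cap - 1)
            down = ([service[m] * (1.0 - find)]
                    + [service[m] + pull] * (cap - 1))
            new = birth_death(up, down)
            change = max(change,
                         max(abs(a - b) for a, b in zip(new, probs[m])))
            probs[m] = new
        if change < tol:
            break
    return probs
```

For a single node the sketch reduces to an ordinary single-server queue, which gives a simple sanity check; for several nodes the output is only as good as the assumed rate expressions.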

We claim:

1. In a distributed heterogeneous computer system
having a plurality of computer nodes each operatively
connected through a network interface to a network to
provide for communications and transfers of data between
the nodes and wherein the nodes each have a
queue for containing jobs to be performed, the improvement
for dynamically reallocating the system’s resources
for optimized job performance comprising:

a) means at each node for dynamically and periodically
calculating and saving a workload value as a
function of the number of jobs on the node’s queue;

b) means at each node for transferring the node’s said
workload value to other nodes on the network at
the request of said other nodes; and,

c) means at each node operable at the completion of
each job,

c1) for checking the node’s own said workload
value,

c2) for polling all the other nodes for their said
workload value if the checking node’s said workload
value is below a pre-established value indicating
the node as being underutilized and available
to do more jobs,

c3) for checking the said workload values of the
other nodes as received, and

c4) for transferring a job from the queue of the other
of the nodes having the highest said workload
value over a pre-established value indicating said
other of the nodes as being overburdened and
requiring job relief to the queue of the checking
node.

2. The improvement to a distributed heterogeneous
computer system of claim 1 and additionally comprising
means at each node periodically operable when the
node is idle:

a) for checking the node’s own said workload value;

b) for polling all the other nodes for their said workload
value if the checking node’s said workload
value is below a pre-established value indicating
the node as being underutilized and available to do
more jobs;

c) for checking the said workload values of the other
nodes as received; and,

d) for transferring a job from the queue of the other of
the nodes having the highest said workload value
over a pre-established value indicating said other of
the nodes as being overburdened and requiring job
relief to the queue of the checking node.

3. The improvement to a distributed heterogeneous 
computer system of claim 1 wherein: 

said means at each node for dynamically and periodi- 
cally calculating and saving a workload value as a 
function of the number of jobs on the node’s queue 
comprises means for dividing the number of jobs 
on the node’s queue by the service rate of the node. 

4. The improvement to a distributed heterogeneous
computer system of claim 1 wherein: 

a) the jobs to be performed by the nodes are assigned 
priority levels; and, 

b) said polling of all the other nodes for their said 
workload value by a node is accomplished by the 
node as a job at the highest priority level. 

5. A distributed heterogeneous computer system hav- 
ing dynamic resource allocation comprising: 

a) network means for providing a communications 
path for computer; 

b) a plurality of computer nodes each operatively 
connected through a network interface to said 
network means whereby to communicate and 
transfer data between said nodes, said nodes each 
having a queue for containing jobs to be per- 
formed; 

c) means at each said node for dynamically and peri- 
odically calculating and saving a workload value as 
a function of the number of jobs on said node’s 
queue; 

d) means at each node for transferring said node’s said
workload value to other nodes over said network 
at the request of said other nodes; and, 

e) means at each node operable at the completion of 
each job, 

e1) for checking said node’s own said workload
value, 

e2) for polling all the other said nodes for their said 
workload value if the checking node’s said work- 
load value is below a pre-established value indi- 
cating said node as being underutilized and avail- 
able to do more jobs, 

e3) for checking the said workload values of the 
other said nodes as received, and

e4) for transferring a job from said queue of the
other of said nodes having the highest said work- 
load value over a pre-established value indicat- 
ing said other of said nodes as being overbur- 
dened and requiring job relief to said queue of 
the checking node. 

6. The distributed heterogeneous computer system of 
claim 5 and additionally comprising means at each said 
node periodically operable when said node is idle: 

a) for checking said node’s own said workload value; 

b) for polling all the other said nodes for their said 
workload value if the checking node’s said work- 
load value is below a pre-established value indicat- 
ing said node as being underutilized and available 
to do more jobs;

c) for checking the said workload values of the other
said nodes as received; and, 

d) for transferring a job from said queue of the other of
said nodes having the highest said workload value 
over a pre-established value indicating said other of 
said nodes as being overburdened and requiring job 
relief to said queue of the checking node. 

7. The distributed heterogeneous computer system of 
claim 5 wherein: 

said means at each node for dynamically and periodi- 
cally calculating and saving a workload value as a 
function of the number of jobs on said node’s queue 
comprises means for dividing the number of jobs 
on said node’s queue by the service rate of said 
node. 

8. The improvement to a distributed heterogeneous 
computer system of claim 5 wherein: 

a) the jobs to be performed by said nodes are assigned 
priority levels; and, 

b) said polling of all the other nodes for their said 
workload value by a node is accomplished by said 
node as a job at the highest priority level. 

9. In a distributed heterogeneous computer system 
having a plurality of computer nodes each operatively 
connected through a network interface to a network to 
provide for communications and transfers of data be- 
tween the nodes and wherein the nodes each have a 
queue for containing jobs to be performed, the method 
of operation for dynamically reallocating the system’s 
resources for optimized job performance comprising 
the steps of: 

a) at each node, dynamically and periodically calcu- 
lating and saving a workload value as a function of 
the number of jobs on the node’s queue; 

b) transferring the node’s workload value to other
nodes on the network at the request of the other 
nodes; and, 

c) at each node at the completion of each job,

c1) checking the node’s own workload value,

c2) polling all the other nodes for their workload 
value if the checking node’s workload value is 
below a pre-established value indicating the 
node as being underutilized and available to do 
more jobs, 

c3) checking the workload values of the other 
nodes as received, and 

c4) transferring a job from the queue of the other of
the nodes having the highest workload value
over a pre-established value indicating the other 
of the nodes as being overburdened and requir- 
ing job relief to the queue of the checking node. 

10. The method of operating a distributed heteroge- 
neous computer system of claim 9 and when the node is 
idle additionally comprising the steps of: 

a) checking the node’s own workload value;

b) polling all the other nodes for their workload value 
if the checking node’s workload value is below
a pre-established value indicating the node as being 
underutilized and available to do more jobs; 

c) checking the workload values of the other nodes as 
received; and, 

d) transferring a job from the queue of the other of the
nodes having the highest workload value over a
pre-established value indicating the other of the
nodes as being overburdened and requiring job
relief to the queue of the checking node. 

11. The method of operating a distributed heteroge- 
neous computer system of claim 9 wherein said step of 
dynamically and periodically calculating and saving a
workload value as a function of the number of jobs on 
the node’s queue comprises the step of: 
dividing the number of jobs on the node’s queue by 
the service rate of the node. 

12. The method of operating a distributed heterogeneous
computer system of claim 9 wherein the jobs to
be performed by the nodes are assigned priority levels 
and: 

said step of polling of all the other nodes for their 
workload value by a node is accomplished by the
node as a job at the highest priority level. 

13. In a distributed heterogeneous computer system
having a plurality of computer nodes each operatively
connected through a network interface to a network to
provide for communications and transfers of data between
the nodes and wherein the nodes each have a
queue for containing jobs to be performed, the improvement
for dynamically reallocating the system’s resources
for optimized job performance comprising:

a) first logic means at each node for dynamically and
periodically calculating and saving a workload
value as a function of the number of jobs on the
node’s queue;

b) second logic means at each node for transferring the
node’s said workload value to other nodes on the
network at the request of said other nodes; and,

c) third logic means at each node operable at the
completion of each job, said third logic means
including,

c1) means for checking the node’s own said workload
value,

c2) means for polling all the other nodes for their
said workload value if the checking node’s said
workload value is below a pre-established value
indicating the node as being underutilized and
available to do more jobs,

c3) means for checking the said workload values of
the other nodes as received, and

c4) means for transferring a job from the queue of
the other of the nodes having the highest said
workload value over a pre-established value
indicating said other of the nodes as being overburdened
and requiring job relief to the queue of
the checking node.

14. The improvement to a distributed heterogeneous
computer system of claim 13 wherein:
said third logic means is also operable periodically 
when the node is idle. 

15. The improvement to a distributed heterogeneous 
computer system of claim 13 wherein:

said first logic means at each node comprises means 
for dividing the number of jobs on the node’s queue 
by the service rate of the node. 

16. The improvement to a distributed heterogeneous 
computer system of claim 13 wherein the jobs to be
performed by the nodes are assigned priority levels and
wherein additionally: 

said means for polling of all the other nodes for their 
said workload value by a node of said third logic 
means includes means for accomplishing said polling
as a job at the highest priority level.
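
The logic recited in the claims (a workload value computed as queue length divided by service rate, polling of all other nodes when the checking node is underutilized, and transfer of one job from the node with the highest workload value above a pre-established value) can be sketched in Python as follows. This is an illustrative sketch only: the class and method names, the threshold values, and the choice to take the job at the head of the donor's queue are assumptions, and network polling is replaced by direct method calls.

```python
from collections import deque


class Node:
    def __init__(self, name, service_rate, low=0.5, high=2.0):
        self.name = name
        self.service_rate = service_rate
        self.low = low      # below this the node counts as underutilized
        self.high = high    # above this a node counts as overburdened
        self.queue = deque()

    def workload(self):
        """Workload value: jobs on the queue divided by the service rate."""
        return len(self.queue) / self.service_rate

    def rebalance(self, peers):
        """Run at the completion of each job, or periodically while idle."""
        if self.workload() >= self.low:
            return None                  # not underutilized; nothing to do
        # Poll all the other nodes for their workload values.
        replies = {peer: peer.workload() for peer in peers}
        busiest = max(replies, key=replies.get, default=None)
        if busiest is None or replies[busiest] <= self.high:
            return None                  # no node is overburdened
        # Transfer one job (assumed: the head of the donor's queue).
        job = busiest.queue.popleft()
        self.queue.append(job)
        return job
```

A node would call `rebalance` with the list of its peers after finishing each job and on each idle wakeup; running the poll as a job at the highest priority level, as in the dependent claims, is not modeled here.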

***** 



